Duplikaterkennung in der Graph-Processing-Platform GRADOOP

Author

Florian Pretzsch

Abstract

The growing importance of graph data in the context of Big Data calls for effective methods to detect duplicates, i.e., vertices that represent the same real-world object. This paper presents the integration of duplicate detection techniques into the graph processing framework GRADOOP. To this end, new duplicate detection operators are added to the GRADOOP framework; among other things, they can determine similarities between vertices of one or more graphs and represent detected duplicates as new edges. The presented concept was implemented as a prototype and evaluated.

Similar resources

GRADOOP: Scalable Graph Data Management and Analytics with Hadoop

Many Big Data applications in business and science require the management and analysis of huge amounts of graph data. Previous approaches for graph analytics such as graph databases and parallel graph processing systems (e.g., Pregel) either lack sufficient scalability or flexibility and expressiveness. We are therefore developing a new end-to-end approach for graph data management and analysi...

Scalable graph analytics with GRADOOP

Many Big Data applications in business and science require the management and analysis of huge amounts of graph data. Previous approaches for graph analytics such as graph databases and parallel graph processing systems (e.g., Pregel) either lack sufficient scalability or flexibility and expressiveness. We are therefore developing a new end-to-end approach for graph data management and analysis...

The Big Picture: Understanding large-scale graphs using Graph Grouping with Gradoop

Graph grouping supports data analysts in decision making based on the characteristics of large-scale, heterogeneous networks containing millions or even billions of vertices and edges. We demonstrate graph grouping with GRADOOP, a scalable system supporting declarative programs composed from multiple graph operations. Using social network data, we highlight the analytical capabilities enabled b...

Distributed Grouping of Property Graphs with Gradoop

Property graphs are an intuitive way to model, analyze and visualize complex relationships among heterogeneous data objects, for example, as they occur in social, biological and information networks. These graphs typically contain thousands or millions of vertices and edges and their entire representation can easily overwhelm an analyst. One way to reduce complexity is the grouping of vertices ...

Linkage Flooding: Ein Algorithmus zur dateninhaltsorientierten Fusion in vernetzten Informationsbeständen

This paper presents a specialized record linkage method (Linkage Flooding) that is optimized for finding duplicates in networked information repositories. After a brief explanation of application scenarios for record linkage and an introduction to the record linkage process, the Linkage Flooding algorithm is described, along with experimental results on duplicate...

Publication date: 2017